
Conversation

naemono (Contributor) commented Oct 22, 2025

Fixes #8429

What is changing?

This ensures that the master StatefulSets are always upgraded last when performing a version upgrade of Elasticsearch.

Validation

  1. Manual testing
  2. Unit test for behavior
  3. e2e test validating behavior

prodsecmachine (Collaborator) commented Oct 22, 2025

Snyk checks have passed. No issues have been found so far.

Scanner               | Critical | High | Medium | Low | Total
Open Source Security  | 0        | 0    | 0      | 0   | 0 issues
Licenses              | 0        | 0    | 0      | 0   | 0 issues


@botelastic botelastic bot added the triage label Oct 22, 2025
Signed-off-by: Michael Montgomery <[email protected]>
naemono (Contributor, Author) commented Oct 29, 2025

buildkite test this -f p=kind,t=TestNonMasterFirstUpgradeComplexTopology -m s=9.1.2

Signed-off-by: Michael Montgomery <[email protected]>
…/cloud-on-k8s into fix-sts-upgrade-issue-recreation
naemono (Contributor, Author) commented Oct 29, 2025

buildkite test this -f p=kind,t=TestHandleUpscaleAndSpecChanges_VersionUpgradeDataFirstFlow -m s=9.1.2

Signed-off-by: Michael Montgomery <[email protected]>
@naemono naemono changed the title from "WIP: Upgrade masters last when upgrading ES clusters" to "Upgrade masters last when upgrading ES clusters" on Oct 29, 2025
@naemono naemono marked this pull request as ready for review October 29, 2025 23:21
Copilot AI (Contributor) left a comment:

Pull Request Overview

This PR implements a non-master-first upgrade strategy for Elasticsearch clusters. The key change ensures that during version upgrades, non-master nodes (data, ingest, coordinating nodes) are upgraded before master nodes, which helps maintain cluster stability during upgrades.

  • Adds logic to separate master and non-master StatefulSets during version upgrades
  • Implements upgrade order validation to ensure non-master nodes complete their upgrades first
  • Adds comprehensive unit and e2e tests to verify the upgrade flow
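
The master/non-master separation described above can be sketched as a pure function. This is a minimal illustration, not the PR's actual code: the `statefulSet` type and `IsMaster` flag are stand-ins for the real Kubernetes resources and ECK's node role labels.

```go
package main

import "fmt"

// statefulSet is a minimal stand-in for an ECK-managed StatefulSet;
// the real code inspects the node role labels on the Pod template.
type statefulSet struct {
	Name     string
	IsMaster bool // true if the Pod template carries the master role
}

// splitByMasterRole partitions StatefulSets so that, during a version
// upgrade, the non-master group can be reconciled (and upgraded) first.
func splitByMasterRole(all []statefulSet) (masters, nonMasters []statefulSet) {
	for _, s := range all {
		if s.IsMaster {
			masters = append(masters, s)
		} else {
			nonMasters = append(nonMasters, s)
		}
	}
	return masters, nonMasters
}

func main() {
	all := []statefulSet{
		{Name: "es-master", IsMaster: true},
		{Name: "es-data"},
		{Name: "es-coordinating"},
	}
	masters, nonMasters := splitByMasterRole(all)
	fmt.Println(len(masters), len(nonMasters)) // 1 2
}
```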

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File                                                 | Description
pkg/controller/elasticsearch/driver/upgrade.go       | Adds a check to distinguish new clusters from upgrades by checking whether the status version is empty
pkg/controller/elasticsearch/driver/upscale.go       | Implements the non-master-first upgrade logic with resource separation and upgrade status checking
pkg/controller/elasticsearch/driver/upscale_test.go  | Adds a comprehensive unit test for the version upgrade flow and minor formatting fixes
test/e2e/es/non_master_first_upgrade_test.go         | Adds an e2e test that validates the non-master-first upgrade behavior with a watcher


Signed-off-by: Michael Montgomery <[email protected]>
@barkbay barkbay added the >bug Something isn't working label Nov 3, 2025
@botelastic botelastic bot removed the triage label Nov 3, 2025
@naemono naemono requested a review from barkbay November 18, 2025 00:43
barkbay (Contributor) commented Nov 26, 2025

I'll take another look today, sorry for the lag.

pendingNonMasterSTS = append(pendingNonMasterSTS, actualStatefulSet)
continue
}

Review comment:

We should not rely on the status until the sts controller has observed the new generation.

Suggested change
if actualStatefulSet.Status.ObservedGeneration < actualStatefulSet.Generation {
// The StatefulSet controller has not yet observed the latest generation.
pendingNonMasterSTS = append(pendingNonMasterSTS, actualStatefulSet)
continue
}

Review comment:

or:

  • use actualStatefulSets.PendingReconciliation() before that loop
  • create a common function to reuse the logic in:

    func (d *defaultDriver) expectationsSatisfied(ctx context.Context) (bool, string, error) {
    	log := ulog.FromContext(ctx)
    	// make sure the cache is up-to-date
    	expectationsOK, reason, err := d.Expectations.Satisfied()
    	if err != nil {
    		return false, "", err
    	}
    	if !expectationsOK {
    		log.V(1).Info("Cache expectations are not satisfied yet, re-queueing", "namespace", d.ES.Namespace, "es_name", d.ES.Name, "reason", reason)
    		return false, reason, nil
    	}
    	actualStatefulSets, err := sset.RetrieveActualStatefulSets(d.Client, k8s.ExtractNamespacedName(&d.ES))
    	if err != nil {
    		return false, "", err
    	}
    	// make sure StatefulSet statuses have been reconciled by the StatefulSet controller
    	pendingStatefulSetReconciliation := actualStatefulSets.PendingReconciliation()
    	if len(pendingStatefulSetReconciliation) > 0 {
    		log.V(1).Info("StatefulSets observedGeneration is not reconciled yet, re-queueing", "namespace", d.ES.Namespace, "es_name", d.ES.Name)
    		return false, fmt.Sprintf("observedGeneration is not reconciled yet for StatefulSets %s", strings.Join(pendingStatefulSetReconciliation.Names().AsSlice(), ",")), nil
    	}
    	// make sure pods have been reconciled by the StatefulSet controller
    	return actualStatefulSets.PodReconciliationDone(ctx, d.Client)
    }
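
The observedGeneration guard discussed in this thread boils down to a filter like the following sketch. The `stsStatus` type is a stand-in for the real StatefulSet resource; in the operator, `PendingReconciliation()` plays this role.

```go
package main

import "fmt"

// stsStatus mimics a StatefulSet's generation bookkeeping: the API
// server bumps Generation on spec updates, and the StatefulSet
// controller echoes it back in Status.ObservedGeneration once it has
// observed the new spec.
type stsStatus struct {
	Name               string
	Generation         int64
	ObservedGeneration int64
}

// pendingReconciliation returns the StatefulSets whose latest spec has
// not yet been observed by the StatefulSet controller; their status
// cannot be trusted until this list is empty.
func pendingReconciliation(all []stsStatus) []string {
	var pending []string
	for _, s := range all {
		if s.ObservedGeneration < s.Generation {
			pending = append(pending, s.Name)
		}
	}
	return pending
}

func main() {
	fmt.Println(pendingReconciliation([]stsStatus{
		{Name: "es-data", Generation: 3, ObservedGeneration: 2},
		{Name: "es-master", Generation: 5, ObservedGeneration: 5},
	})) // [es-data]
}
```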

@naemono naemono requested review from barkbay and pebrc December 1, 2025 13:20
barkbay (Contributor) commented Dec 2, 2025

I will take another look at it today.

Edit: was not able to make it today, will do tomorrow first thing

targetReplicas := sset.GetReplicas(res.StatefulSet)

if actualReplicas < targetReplicas {
actualSset.Spec.Replicas = ptr.To(targetReplicas)
Review comment:

I think we should use UpdateReplicas(...) instead of updating directly Replicas, to also update common.k8s.elastic.co/template-hash

Suggested change
actualSset.Spec.Replicas = ptr.To(targetReplicas)
nodespec.UpdateReplicas(&actualSset, ptr.To[int32](targetReplicas))


if actualReplicas < targetReplicas {
actualSset.Spec.Replicas = ptr.To(targetReplicas)
if err := ctx.k8sClient.Update(ctx.parentCtx, &actualSset); err != nil {
Review comment:

Should we not use es_sset.ReconcileStatefulSet() instead of calling k8sClient.Update(..) directly? It already includes the call to expectations.ExpectGeneration(reconciled).

Review comment:

And maybe a follow-up question is should we update UpscaleResults with the result of es_sset.ReconcileStatefulSet() to work with a consistent view in the rest of the driver logic.
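
The value of routing the update through a reconcile helper is that the expected generation is recorded atomically with the write, so callers cannot forget the `ExpectGeneration` step. A toy sketch of that idea, using hypothetical names rather than `es_sset`'s real API:

```go
package main

import "fmt"

// expectations tracks the StatefulSet generation the operator expects
// to observe before it trusts its cache again.
type expectations struct {
	expected map[string]int64
}

// updateReplicas bumps the spec and records the resulting generation in
// one step, mimicking what a ReconcileStatefulSet-style helper does so
// the expectation-recording cannot be skipped by accident. (The API
// server would bump metadata.generation on a real update.)
func (e *expectations) updateReplicas(name string, gen *int64, replicas *int32, target int32) {
	*replicas = target
	*gen = *gen + 1
	e.expected[name] = *gen
}

// satisfied reports whether an observed generation has caught up with
// the recorded expectation.
func (e *expectations) satisfied(name string, observed int64) bool {
	return observed >= e.expected[name]
}

func main() {
	e := &expectations{expected: map[string]int64{}}
	gen, replicas := int64(1), int32(3)
	e.updateReplicas("es-master", &gen, &replicas, 4)
	fmt.Println(replicas, e.satisfied("es-master", 1), e.satisfied("es-master", 2))
	// 4 false true
}
```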

// The only adjustment we want to make to master statefulSets before ensuring that all non-master
// statefulSets have been reconciled is to potentially scale up the replicas
// which should happen 1 at a time as we adjust the replicas early.
if err = maybeUpscaleMasterResources(ctx, masterResources); err != nil {
Review comment:

Suggested change
if err = maybeUpscaleMasterResources(ctx, masterResources); err != nil {
if err := maybeUpscaleMasterResources(ctx, masterResources); err != nil {

// Read the current StatefulSet from k8s to get the latest state
var actualSset appsv1.StatefulSet
if err := ctx.k8sClient.Get(ctx.parentCtx, k8s.ExtractNamespacedName(&res.StatefulSet), &actualSset); err != nil {
if apierrors.IsNotFound(err) {
Review comment:

I was trying to understand in which cases we can get a 404. IIUC, one of them is when the user is attempting to scale up the masters with a new nodeset? Let's maybe add a godoc comment to explain that we only scale up existing master StatefulSets; new ones are ignored.

// The only adjustment we want to make to master statefulSets before ensuring that all non-master
// statefulSets have been reconciled is to potentially scale up the replicas
// which should happen 1 at a time as we adjust the replicas early.
if err = maybeUpscaleMasterResources(ctx, masterResources); err != nil {
Review comment:

I just realized that calling this when len(nonMasterResources) == 0 (or more generally, when all non-master nodesets have already been upgraded?) can be slightly suboptimal.

Assuming that the initial state is:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 9.1.0
  nodeSets:
  - name: default
    config:
      node.roles: ["master", "data", "ingest", "ml"]
      node.store.allow_mmap: false
    count: 3

If we update and upgrade to:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 9.1.2
  nodeSets:
  - name: default
    config:
      node.roles: ["master", "data", "ingest", "ml"]
      node.store.allow_mmap: false
    count: 4

Then we are going to scale up the 9.1.0 StatefulSet, leading to the creation of elasticsearch-sample-es-default-3, but immediately in the next reconciliation we are going to delete elasticsearch-sample-es-default-3 to upgrade it to 9.1.2.
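
One way to avoid the wasted Pod creation described above would be to skip the early master upscale while the StatefulSet still runs the old version. A hedged sketch of such a guard (a hypothetical helper, not the PR's implementation):

```go
package main

import "fmt"

// shouldUpscaleEarly decides whether scaling a master StatefulSet up
// before the non-master upgrades is worthwhile: creating a Pod on the
// old version, only for it to be deleted and recreated on the target
// version in the next reconciliation, is wasted work.
func shouldUpscaleEarly(currentVersion, targetVersion string, actualReplicas, targetReplicas int32) bool {
	if actualReplicas >= targetReplicas {
		return false // nothing to scale up
	}
	// Only upscale early if a new Pod would already run the target version.
	return currentVersion == targetVersion
}

func main() {
	// Scenario from the comment above: scaling 3 -> 4 while still on 9.1.0.
	fmt.Println(shouldUpscaleEarly("9.1.0", "9.1.2", 3, 4)) // false
	fmt.Println(shouldUpscaleEarly("9.1.2", "9.1.2", 3, 4)) // true
}
```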

Review comment:

My previous comment made me wonder if !isVersionUpgrade is actually the only reason we might want to reconcile everything at once.

barkbay (Contributor) commented Dec 3, 2025

buildkite test this -f p=kind,t=TestNonMasterFirstUpgradeComplexTopology -m s=8.15.2

func(k *test.K8sClient, t *testing.T) {
statefulSets, err := essset.RetrieveActualStatefulSets(k.Client, k8s.ExtractNamespacedName(&es))
if err != nil {
t.Logf("failed to get StatefulSets: %s", err.Error())
Review comment:

Will this test fail if we consistently get an error here? (my feeling is that it's not going to be the case because violations is always empty in that case, but maybe I'm missing something)

barkbay (Contributor) left a comment:

Almost LGTM. I think we need to adjust the way we scale the master nodes. Also, the e2e test seems broken (we create the data integrity index with no replicas, which should fail during a rolling upgrade), and it may not be accurate in case of errors.

Comment on lines +96 to +100
mutated := initial.WithNoESTopology().
WithVersion(dstVersion).
WithESMasterNodes(3, elasticsearch.DefaultResources).
WithESDataNodes(2, elasticsearch.DefaultResources).
WithESCoordinatingNodes(1, elasticsearch.DefaultResources)
barkbay (Contributor) commented Dec 3, 2025:

Without WithMutatedFrom the data integrity index has no replicas, which means that the cluster is going to be red as soon as one Pod is recreated, and the test is going to fail.

Suggested change
mutated := initial.WithNoESTopology().
WithVersion(dstVersion).
WithESMasterNodes(3, elasticsearch.DefaultResources).
WithESDataNodes(2, elasticsearch.DefaultResources).
WithESCoordinatingNodes(1, elasticsearch.DefaultResources)
mutated := initial.WithVersion(dstVersion).WithMutatedFrom(&initial)

Edit: confirmed by the e2e test I started in my previous buildkite comment

{
    "Time": "2025-12-03T09:12:11.122554149Z",
    "Action": "output",
    "Package": "github.com/elastic/cloud-on-k8s/v3/test/e2e/es",
    "Test": "TestNonMasterFirstUpgradeComplexTopology/Elasticsearch_cluster_health_should_not_have_been_red_during_mutation_process",
    "Output": "{\"log.level\":\"error\",\"@timestamp\":\"2025-12-03T09:12:11.117Z\",\"message\":\"continuing with additional tests\",\"service.version\":\"0.0.0-SNAPSHOT+00000000\",\"service.type\":\"eck\",\"ecs.version\":\"1.4.0\",\"error\":\"test Elasticsearch cluster health should not have been red during mutation process failed\",\"error.stack_trace\":\"github.com/elastic/cloud-on-k8s/v3/test/e2e/test.StepList.RunSequential\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/step.go:52\ngithub.com/elastic/cloud-on-k8s/v3/test/e2e/test.RunMutationsWhileWatching\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/run_mutation.go:77\ngithub.com/elastic/cloud-on-k8s/v3/test/e2e/es.runNonMasterFirstUpgradeTest\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/es/non_master_first_upgrade_test.go:76\ngithub.com/elastic/cloud-on-k8s/v3/test/e2e/es.TestNonMasterFirstUpgradeComplexTopology\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/es/non_master_first_upgrade_test.go:102\ntesting.tRunner\n\t/usr/local/go/src/testing/testing.go:1792\"}\n"
}


Labels

>bug Something isn't working · v3.3.0 (next)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] ECK should only Update Elasticsearch StatefulSet version when attempting to upgrade the StatefulSet Pods

4 participants